A sustainable and secure load management model for green cloud data centres

The massive upsurge in cloud resource demand and inefficient load management undermine the sustainability of Cloud Data Centres (CDCs), resulting in high energy consumption, resource contention, excessive carbon emission, and security threats. In this context, a novel Sustainable and Secure Load Management (SaS-LM) Model is proposed to enhance security for users while delivering sustainability for CDCs. The model estimates and reserves the required resources, viz. compute, network, and storage, and dynamically adjusts the load subject to maximum security and sustainability. An evolutionary optimization algorithm named Dual-Phase Black Hole Optimization (DPBHO) is proposed for optimizing a multi-layered feed-forward neural network, allowing the model to estimate resource usage and detect probable congestion. Further, DPBHO is extended to a multi-objective DPBHO algorithm for secure and sustainable VM allocation and management, minimizing the number of active server machines, carbon emission, and resource wastage for greener CDCs. SaS-LM is implemented and evaluated using benchmark real-world Google Cluster VM traces. Comparison with state-of-the-art approaches reveals its efficacy, reducing carbon emission and energy consumption by up to 46.9% and 43.9%, respectively, and improving resource utilization by up to 16.5%.

The major challenge in developing such a solution is the trade-off among contradictory objectives during load management. Undeniably, the cloud service provider (CSP) aspires to maximize revenue by distributing the maximum workload on the minimum number of active servers, boosting energy efficiency and reducing power consumption costs while ignoring security aspects during load execution. Such a distribution of resources allows multiple users to share common physical machines and raises the probability of security breaches on VMs executing the workload of different users. Conversely, the energy efficiency of the cloud environment falls and carbon footprint emission rises if the CSP minimizes sharing of physical servers to strengthen the security of users' workloads.
In view of the aforementioned context, this article proposes a novel Secure and Sustainable Load Management (SaS-LM) Model to minimize security threats, power consumption, and carbon emission while maximizing server resource utilization and improving Power Usage Effectiveness (PUE). This model analyses the cloud workload in anticipation, addresses the differing resource utilization of virtual machines, and manages the entire load while considering multiple factors related to security and sustainability. It employs a Multi-layered Feed-forward Neural Network (MFNN) as a workload analyser, which is optimized by a newly developed Dual-Phase BlackHole Optimization (DPBHO) algorithm. Further, a secure and sustainable VM placement (VMP) is presented for optimized allocation of physical resources among VMs to serve the perspectives of both cloud users and service providers while ensuring the sustainability of CDCs. For cloud users, it ensures secure placement of VMs by minimizing the probability of security breaches; for the service provider, it reduces the operational cost of the CDC by maximizing server resource utilization and minimizing power consumption. Also, the sustainability of the cloud environment is enhanced by improving power usage effectiveness and minimizing carbon footprint intensity.
The key contributions of the proposed work are fivefold:
• An MFNN-based cloud workload resource usage analyser is developed to forecast resource usage in real-time with enhanced accuracy; it triggers load shifting to alleviate the effect of over-/under-load on a server before its actual occurrence and improves the performance of the CDC.
• A novel DPBHO algorithm is proposed for optimization of the MFNN during cloud resource usage estimation. It is further extended to a multi-objective DPBHO (M-DPBHO) for placement of VMs subject to multiple constraints and objectives.
• A secure and sustainable VMP is proposed to enhance the sustainability and security of the CDC while reducing its energy consumption, simultaneously serving the perspectives of both service provider and end-user.
• The model facilitates secure execution of user applications by minimizing, in real-time, the sharing of common physical server machines among different users.
• Experimental simulation and evaluation of the proposed model using a real benchmark dataset reveal that the proposed work outperforms state-of-the-art approaches on various performance metrics.
The rest of the paper is organized as follows: Section "Results" discusses the experimental set-up and results for workload prediction, resource utilization, power consumption, sustainability, and security, along with the trade-offs among the obtained results. Section "Method" presents the proposed approach, including Dual-Phase Black-Hole Optimization, cloud workload usage analysis, secure and sustainable VM placement, VM management, and the SaS-LM operational summary. The background and related discussion are given in Section "Background and discussion". Finally, Section "Conclusion and future work" provides concluding remarks and the future scope of the proposed work.

Results
The simulation experiments are executed on a server machine assembled with two Intel® Xeon® Silver 4114 CPUs with 40 cores and a 2.20 GHz clock speed. The server machine is deployed with 64-bit Ubuntu 16.04 LTS and 128 GB of main memory. The data centre environment includes three different types of servers and four types of VM configuration, shown in Tables 1 and 2. The resource features such as power consumption (PWmax, PWmin), MIPS, RAM, and memory are taken from real IBM server configurations11, where S1 is 'ProLiant M110 G5 XEON 3075', S2 is 'IBM X3250 Xeon x3480', and S3 is 'IBM 3550 Xeon x5675'. The VM configurations are inspired by the VM instances on the Amazon website12. Table 3 shows the experimental set-up parameters and their values.
The Google Cluster Dataset (GCD) is utilized for performance estimation of SaS-LM and the comparative approaches; it contains CPU, memory, and disk I/O request and resource usage information for 672,300 jobs executed on 12,500 servers over a period of 29 days13. The CPU and memory utilization percentages of VMs are obtained from the given CPU and memory usage percentage of each task at five-minute intervals over a period of twenty-four hours. Table 4 reports the performance metrics: MAE (ϖ_MAE), MSE (ϖ_MSE), PUE, carbon footprint rate (CFR), resource contention rate (RCR), probability of co-residency threats, power consumption (PW), resource utilization (RU), number of VM migrations (Mig#), and SLA violations (SLA_V) achieved for GCD workloads for varying sizes of the data centre (200-1000 VMs) over 400 minutes.
The accuracy of forthcoming workload estimation using the proposed DPBHO-optimized MFNN prediction unit governs the performance of the SaS-LM model. The average prediction errors ϖ_MAE and ϖ_MSE vary from 0.093 to 0.0126 and 0.0090 to 0.0006, respectively. The value of PUE is observed in the range of 1 to 1.4, signifying the sustainable efficiency of SaS-LM. The values of CFR vary in line with the power consumption (PW), which increases with the increasing size of the data centre. The value of PW depends on the workload execution and the number of active servers at a specific instant; hence, PW changes non-uniformly over the observed period. The RCR varies non-uniformly for the various sizes of data centre. The resource utilization is obtained close to 80%, independent of the size of the data centre. The number of VM migrations and SLA violations varies with the variation of the workload, i.e., the number of over-/under-loads experienced over a continuous period. Figure 1 plots the actual versus predicted normalized values of CPU and memory usage achieved via multiple-resource prediction using MFNN, wherein the predicted values lie close to or overlap the actual values, revealing its efficacy in terms of prediction accuracy.
The proposed work is compared on different performance metrics with various state-of-the-art approaches, including Slack and Battery Aware placement (SBA)14, Static THReshold with Multiple Usage Prediction (THR-P) and Dynamic threshold based on Local Regression with Multiple Usage Prediction (LR-P)15, Previously Co-located User First (PCUF)16, Prediction based Energy-aware Fault-tolerant Scheduling (PEFS)17, Online VM Prediction based Multi-objective Load Balancing (OP-MLB)18, Boruta-forest optimization based Multi-objective Job Scheduling (BM-JS)4, VM placement with Online Multiple resources-based Feed-forward Neural Network (OM-FNN)19, Secure and Multi-objective VM placement (SVMP)20, and Wiener filter Prediction with Safety Margin (WP-SM) based VM allocation21. A concise description of all these approaches is provided in the Background discussion, and Table 5 presents a comparison of the key performance indicators of the proposed framework versus the comparative approaches. Fig. 2b reveals a prediction accuracy (Acu_Pr%) trend: SaS-LM ≥ OP-MLB ≥ PEFS ≥ tri-adaptive differential evolution based neural network (TaDE-NN) ≥ auto-adaptive differential evolution based neural network (AADE-NN). The convergence capability of the proposed DPBHO algorithm while optimizing the neural network based predictor is compared with that of AADE18.
Resource utilization.
Sustainability. Figure 5a compares the average percentage of active servers of SaS-LM with the related approaches. The number of active servers for SaS-LM is observed in the range of 18-40%, which is reduced by 8.45%, 1.5%, 33.8%, 6.25%, and 43.5% against THR-P, SBA, BM-JS, OP-MLB, and WP-SM, respectively. The generation of carbon footprint (CFR_CDC, Kg/kWh) is observed in line with the consumption of power, as depicted in Fig. 5b, where CFR_CDC is compared over a periodic interval of 400 minutes for a CDC of size 600 VMs. SaS-LM has reduced CFR_CDC by up to 21.2% and 46.9% against OP-MLB and SaS-LM⁻, respectively. Further, the rate of resource contention realized for the related approaches is compared in Fig. 5c.
The rate of failure of resources remains below 4% for SaS-LM across all experimental cases. Also, the rate of contention of physical resources is reduced by up to 95.4%, 92.8%, and 89.4% over PEFS, OM-FNN, and OP-MLB, respectively.

SaS-LM
The reason behind this performance improvement is the accurate estimation of required resources due to the employment of the proposed DPBHO for optimization of the MFNN, allowing intuitive pattern learning. Furthermore, it is noteworthy that the proposed multi-objective DPBHO selects the most admissible VM placement strategy to enhance resource utilization and minimize power consumption by reducing the number of active servers while maintaining the resource availability constraints (Fig. 6).
Trade-offs. There are noticeable trade-offs among resource utilization, power consumption, sustainability, and security during load management. The consolidation of VMs on a minimum number of physical machines reduces the consumption of power and the wastage of resources, which leads to reduced carbon footprint emissions. However, the probability of security threats increases with high virtualization and sharing of physical resources because of the multi-tenant environment. Furthermore, to reduce power consumption, the entire load must be allocated on the minimum number of servers, which may incur resource contention among VMs and degrade security and overall performance. Hence, sustainability improves at the cost of security at the resource management level, revealing a strong contradiction between the two objectives.

Method
A sustainable CDC infrastructure is organized utilizing P servers {S_1, S_2, …, S_P} located within n clusters {CS_1, CS_2, …, CS_n}, powered by Renewable Sources of Energy (RSE) and the grid via a battery energy storage system, as illustrated in Fig. 7. The electric power produced by multiple RSE, such as solar panels and wind energy, and by the power grid charges battery storage, including an Uninterruptible Power Supply (UPS), which is discharged to provide the required power supply and backup to the clusters of servers. A Resource Management Unit (RMU) performs physical resource management such as handling of over-/under-loading of servers, VM placement, VM migration, and scheduling. The RMU is responsible for two-phase scheduling, including (i) distribution of job requests {r_1, r_2, …, r_M} among VMs and (ii) placement of VMs {V_1, V_2, …, V_Q} on servers. Accordingly, it assigns job requests {r_1, r_2, …, r_M} to VMs corresponding to the user-specified resource (viz. CPU, memory, bandwidth) capacity. Further, it applies multi-objective load balancing optimization for allocation of users' VMs {V_1, V_2, …, V_Q} to the available physical servers {S_1, S_2, …, S_P} subject to security and energy efficiency.
A Cloud Workload and Resource Usage Analyser (CW-RUA) is employed to estimate the workload and physical resource usage proactively and assist the RMU by providing useful knowledge for resource provisioning in anticipation. CW-RUA captures the historical and live traces of resource utilization by VMs {V_1, V_2, …, V_Q} hosted on different servers {S_1, S_2, …, S_P} within clusters {CS_1, CS_2, …, CS_n}. The workload and resource usage analysis is performed in two steps, executed periodically: (i) data preparation and (ii) predictor optimization. Data is prepared in the form of a learning-window vector using three consecutive steps: aggregation of resource usage traces, rescaling of the aggregated values, and normalization. The learning-window vector is passed to a neural network-based predictor, which is trained/optimized with the help of the novel DPBHO evolutionary optimization algorithm. The detailed descriptions of DPBHO, CW-RUA, and the Secure and Sustainable VMP (SS-VMP) are elucidated in Sections "Dual-phase black-hole optimization", "Cloud workload resource usage analysis", and "Secure and sustainable VM placement", respectively.

Dual-phase black-hole optimization. A two-phase population-based optimization algorithm named Dual-Phase Black-Hole Optimization (DPBHO) is proposed, wherein, in each phase, the candidate solutions are considered as stars while the star with the best fitness value is regarded as a black-hole. Figure 8 portrays the DPBHO design, which incorporates three consecutive steps: (i) Local population optimization, (ii) Global population optimization, and (iii) Position update.
Local population optimization. In this phase, the stars, i.e., random solutions {ξ_1, ξ_2, …, ξ_N} ∈ E, are organized into K clusters or sub-populations, each of size N/K. All the members of each cluster (ξ_i^k : i ∈ [1, N/K], k ∈ [1, K]) are evaluated over the training data using the fitness value (f_i^k) obtained by computing Eq. (1), where F(ξ_i^k) is the fitness evaluation function. The best solution of each kth cluster is considered as its local blackhole (ξ_k^Lbest), i.e., the member with the minimum fitness value in that cluster.
Global population optimization. In the global optimization phase, all the local blackholes constitute the second-phase population {ξ_1^Lbest, ξ_2^Lbest, …, ξ_K^Lbest}, wherein heuristic crossover is performed to raise the diversity of the second-phase population by producing new individuals of a superior breed. In the course of heuristic crossover, stars act as chromosomes: two parent chromosomes are randomly chosen and their fitness values are compared to determine the parent with the better fitness value. Afterward, a new offspring is produced from the combination of the two parent chromosomes using Eq. (2), lying closer to the parent having the better fitness value24. This additional step brings significant diversity to the search space by adding new and better individuals to the second-phase population. Let ξ_k^Lbest and ξ_j^Lbest be two parent chromosomes (k ≠ j), wherein ξ_k^Lbest is the parent with the better fitness value. The offspring ξ_Off is then generated by Eq. (2), where Cr_i is a randomly generated crossover rate in the range [0, 1] for the ith gene, i = {1, 2, …, L}, ξ_Off is the new offspring, and ξ_k^Lbest_i and ξ_j^Lbest_i are the ith genes of the parents ξ_k^Lbest and ξ_j^Lbest, respectively. A new offspring is produced for each of the K heuristic crossovers (K equals the total number of local blackholes). Equation (3) is applied to select the better of the new offspring (ξ_Off) and the parent with the lesser fitness (ξ_j^Lbest), which enhances the diversity of the local population with members of enriched fitness value.
Thereafter, the best among the members of the second-phase population is nominated as the global blackhole (ξ^Gbest).
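As a concrete reading of this recombination step, the sketch below assumes the classical per-gene heuristic-crossover form for Eq. (2) (the equation body is not reproduced above) together with an Eq. (3)-style survivor selection; lower fitness is treated as better because MSE/MAE serve as fitness. Function names are illustrative, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng()

def heuristic_crossover(parent_better, parent_worse):
    """Eq. (2) sketch (classical per-gene heuristic crossover): the
    offspring extrapolates from the fitter parent, so it always lies
    closer to it, as the text requires. Cr_i is drawn per gene."""
    cr = rng.random(parent_better.shape)          # Cr_i in [0, 1]
    return parent_better + cr * (parent_better - parent_worse)

def select_survivor(offspring, parent_worse, fitness):
    """Eq. (3) sketch: keep whichever of the offspring and the weaker
    parent is fitter; lower is better since MSE/MAE act as fitness."""
    return offspring if fitness(offspring) < fitness(parent_worse) else parent_worse
```

Because the offspring extrapolates past the fitter parent, each crossover can probe regions beyond the current local blackholes, which is consistent with the extra diversity the text attributes to this step.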
Position update. The position of the stars is updated in accordance with ξ_k^Lbest and ξ^Gbest as depicted in Eq. (4), where ξ_i^k(t) and ξ_i^k(t+1) are the positions of the ith star of the kth sub-population at time instances t and t+1, respectively; r_1 and r_2 are random numbers in the range (0, 1), while α_l and α_g are the attraction forces applied on ξ_i^k(t) by ξ_k^Lbest and ξ^Gbest, respectively. The inclusion of the local best in the position update procedure maintains the diversity of the stars by gradually controlling the convergence speed and retains their exploratory behaviour.
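Eq. (4) is not typeset in this extract; a minimal sketch consistent with the description (two attraction terms, fresh random draws r_1, r_2 per update, and tunable forces α_l and α_g) would be:

```python
import numpy as np

rng = np.random.default_rng()

def update_position(star, local_best, global_best, alpha_l=1.0, alpha_g=1.0):
    """Eq. (4) sketch: each star is pulled toward its cluster's local
    blackhole and toward the global blackhole. r1 and r2 are fresh
    uniform(0, 1) draws per update; alpha_l and alpha_g are the
    attraction forces named in the text."""
    r1, r2 = rng.random(), rng.random()
    return (star
            + r1 * alpha_l * (local_best - star)
            + r2 * alpha_g * (global_best - star))
```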
The fitness value of all the updated stars is computed by applying Eq. (1). If the kth cluster locates a better solution than the existing one, the respective ξ_k^Lbest is replaced, and ξ^Gbest is updated accordingly. The algorithm is inspired by the natural blackhole phenomenon, where a blackhole consumes everything that enters it, including light. DPBHO works on the concept of the standard blackhole optimization algorithm, wherein none of the candidate solutions is allowed to return from the event horizon (h) area of a blackhole solution, delineated by its radius (R_h). The ratio between the fitness value of a local blackhole (f(ξ_k^Lbest)) and the summed fitness value of its sub-population computes the event horizon radius (R_h^Lbest) of the respective blackhole, as given in Eq. (5). Similarly, the event horizon radius of the global blackhole (R_h^Gbest) is evaluated using Eq. (6), where the denominator is the summed fitness value of the entire population.
To confirm that a member solution has entered the event horizon of a blackhole solution, the distance between the two solutions is estimated using the arithmetic difference of their fitness values. The distance from both the local and global blackholes is calculated because each solution is attracted to these two blackholes. Accordingly, the distance of the ith star (ξ_i^k) of the kth sub-population from the local blackhole (ξ_k^Lbest) and the global blackhole is computed in Eqs. (7) and (8), respectively.
If the distance between a candidate solution ξ_i^k and the local blackhole (ξ_k^Lbest) is less than or equal to the event horizon radius R_h^Lbest, then ξ_i^k collapses and is replaced by a new randomly generated solution to keep a uniform number of solutions throughout the simulation. Following the same procedure, ξ_i^k collapses and is replaced by a new random solution when it enters the event horizon radius of the global blackhole, R_h^Gbest. The operational summary of DPBHO is given in Algorithm 1. Step 1 initializes random solutions and has complexity O(1).
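A compact sketch of this event-horizon bookkeeping follows. The summation in the denominators of Eqs. (5) and (6) is inferred from the standard blackhole algorithm, as the symbol is lost in this extract, and the replacement bounds mirror the weight range used later in Eq. (14).

```python
import numpy as np

def event_horizon_radius(f_blackhole, f_population):
    """Eqs. (5)/(6) sketch: blackhole fitness over the summed fitness of
    the (sub-)population it governs; the summation is an inference from
    the standard blackhole algorithm."""
    return f_blackhole / np.sum(f_population)

def collapse_if_absorbed(star, f_star, f_blackhole, radius, rng):
    """Eqs. (7)/(8) measure distance as |f(star) - f(blackhole)|; a star
    inside the horizon collapses and is replaced by a fresh random star
    (bounds [-1, 1] mirror the weight range of Eq. (14))."""
    if abs(f_star - f_blackhole) <= radius:
        return rng.uniform(-1.0, 1.0, size=star.shape)
    return star
```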
An illustration. Let there be 9 solutions (or stars) in the initial population (E_1), as shown in Table 9, which are grouped into 3 clusters during the first generation or epoch: Cluster 1 (Table 10), Cluster 2 (Table 11), and Cluster 3 (Table 12). The fitness of each candidate solution is estimated using Eq. (11), and a local best candidate is selected from each cluster. Likewise, ξ_1^Lbest, ξ_2^Lbest, and ξ_3^Lbest constitute the local best population, on which the heuristic crossover operation is performed using Eq. (2) to improve it, and a global best candidate (ξ^Gbest) is chosen after fitness evaluation, as depicted in Table 14. Further, the population is updated by computing the event horizon radius for each cluster as well as a global radius for the entire population, as observed in Table 15. The distance of each candidate of the first-generation population is estimated using Eqs. (7) and (8) to generate the next-generation population, as illustrated in Table 16, wherein the candidates ξ_1, ξ_5, ξ_6, and ξ_8 are updated.
Cloud workload resource usage analysis. The cloud workload analysis comprises two steps: data preparation and multi-layered feed-forward neural network (MFNN) optimization using the DPBHO algorithm, as described in detail in the following subsections.
Data preparation. The MFNN derives initial information for data preparation from the Historical Resource Usage database of the different clusters {CS_1, CS_2, …, CS_n}, which is updated periodically with live resource usage information, as portrayed in block CW-RUA of Fig. 7. Let the received historical resource usage information be {d_1, d_2, …, d_z}. Each input value is min-max normalized as ϖ̂_In = (ϖ_In − ϖ_In^min)/(ϖ_In^max − ϖ_In^min), where ϖ_In^min and ϖ_In^max are the minimum and maximum values of the input data set, respectively. The normalized vector, denoted ϖ̂_In, is the set of all normalized input data values.
These normalized values (in a single dimension) are organized into two-dimensional input and output matrices, denoted ϖ_In and ϖ_Out, respectively, as stated in Eq. (10).
MFNN optimization. The prepared data values ϖ_In are divided into three groups: training (60%), testing (20%), and validation (20%) data, where the training data is used to optimize the predictor while the testing data is used for evaluating prediction accuracy over unseen data. During training, the MFNN extracts intuitive patterns from the actual workload (ϖ_In) and analyzes z previous resource usage values to predict the (z+1)th instance of workload in each pass. During the training and testing period, the performance and accuracy of the proposed model is evaluated by estimating the Mean Squared Error (ϖ_MSE) score as the fitness function using Eq. (11), where ϖ_AO and ϖ_PO are the actual and predicted output, respectively25. Further, validation data is applied to confirm the accuracy of the proposed prediction model, wherein the Mean Absolute Error (ϖ_MAE) stated in Eq. (12) is used as a fitness function because it is an easily interpretable and well-established metric for evaluating regression models.
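A minimal sketch of this data-preparation and fitness pipeline, under the assumption that Eq. (10) arranges z-length sliding windows as inputs with the (z+1)th value as the target (function names are illustrative):

```python
import numpy as np

def min_max_normalize(x):
    """Rescale raw usage traces into [0, 1] using the dataset's minimum
    and maximum, as described for the data-preparation step."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def make_learning_windows(series, z):
    """Eq. (10) sketch: each row holds z consecutive usage values; the
    (z + 1)th value is the prediction target."""
    X = np.array([series[i:i + z] for i in range(len(series) - z)])
    y = np.asarray(series[z:])
    return X, y

def mse(actual, predicted):   # Eq. (11): training/testing fitness
    return float(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

def mae(actual, predicted):   # Eq. (12): validation fitness
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))
```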

In the proposed approach, the MFNN represents a mapping p-q_1-q_2-q_3-r, wherein p, q_1, q_2, q_3, and r are the numbers of neurons in the input, hidden#1, hidden#2, hidden#3, and output layers, respectively. Since the output layer has only one neuron, the value of r is constantly 1. The activation function used to update a neuron is stated in Eq. (13), where a linear function f(ϖ) = ϖ is applied to the input layer neurons and the sigmoid function 1/(1 + e^(−ϖ)) to the rest of the neural layers.
The training begins with N randomly generated networks of real-numbered vectors denoted {ξ_1, ξ_2, …, ξ_N} ∈ E, wherein each vector (ξ_i : 1 ≤ i ≤ N) has size L = ((p+1)×q_1 + q_1×q_2 + q_2×q_3 + q_3×r). The number of neurons in the input layer becomes p + 1 owing to one additional bias neuron. The synaptic or neural weights (W_ij) are generated randomly with uniform distribution as shown in Eq. (14), where lb_j = −1 and ub_j = 1 are the lower and upper bounds, respectively, and r is a random number in the range [0, 1].
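The following sketch ties Eqs. (13) and (14) together: it computes the vector length L, draws weights uniformly in [lb_j, ub_j] = [−1, 1], and runs a forward pass with a linear input layer and sigmoid activations elsewhere. The unpacking of the flat star vector into layer matrices is an assumption about layout, not the paper's code.

```python
import numpy as np

def vector_length(p, q1, q2, q3, r=1):
    """L = (p + 1)*q1 + q1*q2 + q2*q3 + q3*r; the +1 is the bias neuron."""
    return (p + 1) * q1 + q1 * q2 + q2 * q3 + q3 * r

def init_star(p, q1, q2, q3, r=1, lb=-1.0, ub=1.0, rng=None):
    """Eq. (14): weights drawn uniformly as lb + rand*(ub - lb)."""
    rng = rng or np.random.default_rng()
    return lb + rng.random(vector_length(p, q1, q2, q3, r)) * (ub - lb)

def forward(star, x, p, q1, q2, q3, r=1):
    """Eq. (13): the input layer is linear (raw inputs plus a bias of 1),
    while every subsequent layer applies the sigmoid 1/(1 + exp(-v)).
    Reshaping the flat star into layer matrices is an assumed layout."""
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    h = np.append(np.asarray(x, dtype=float), 1.0)   # bias neuron
    i = 0
    for n_in, n_out in [(p + 1, q1), (q1, q2), (q2, q3), (q3, r)]:
        W = star[i:i + n_in * n_out].reshape(n_in, n_out)
        i += n_in * n_out
        h = sig(h @ W)
    return h
```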
The MFNN is optimized periodically using DPBHO by considering each network vector (ξ_i : 1 ≤ i ≤ N) as a star, where Eq. (11) is applied as the fitness function and the candidate having the least fitness value is nominated as the best candidate in both the local and global population optimization phases.

Secure and sustainable VM placement.
Let ω represent a mapping between VMs and servers such that ω_kji = 1 if server S_i hosts V_j of the kth user, and 0 otherwise, as stated in Eq. (15).
The essential set of constraints that must be satisfied concurrently is formulated in Eq. (16), where C1 implies that the jth VM of the kth user must be deployed on exactly one server. The constraints C2, C3, and C4 state that the jth VM's CPU (V_j^C), memory (V_j^M), and bandwidth (V_j^BW) requirements must not exceed the available resource capacity of the ith server (S_i^C*, S_i^M*, S_i^BW*). C5 specifies that the aggregate resource capacity request of all the users must not exceed the total available resource capacity of the servers altogether. C6 states that the required resource capacity (R_k) of request r_k must not exceed the total available resource capacity (R* ∈ {C*, M*, BW*}) of VM V_j.
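A hedged feasibility check for a candidate placement against C1-C4 might look as follows; the field names ('cpu', 'mem', 'bw') are illustrative, and C5-C6 are omitted because they require request-level data not modelled in this sketch.

```python
def feasible(placement, vms, servers):
    """Check constraints C1-C4 of Eq. (16) for a candidate placement.

    placement maps vm_id -> server_id, so C1 (each VM on exactly one
    server) holds by construction. vms[vm_id] and servers[server_id]
    are dicts with 'cpu', 'mem', and 'bw' fields -- illustrative names,
    not the paper's notation.
    """
    for sid, cap in servers.items():
        hosted = [v for v, s in placement.items() if s == sid]
        for res in ("cpu", "mem", "bw"):   # C2, C3, C4: one check per resource
            if sum(vms[v][res] for v in hosted) > cap[res]:
                return False
    return True
```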
The considered load management problem in the CDC, entangled with multiple constraints, seeks a secure and energy-efficient VM placement. Accordingly, a multi-objective function for allocating VMs is stated in Eq. (17). Likewise, the following five distinct models, associated with each objective, are designed and utilized to establish a secure and sustainable VM placement scheme for the CDC.
Security modeling. The sharing of servers among different users is minimized by reducing the allocation of VMs of different users on a common physical server, resisting the probability of security attacks via co-resident malicious VMs. Let β_ki specify a mapping between user U_k and server S_i such that β_ki = 1 if server S_i hosts a VM of user U_k, and 0 otherwise. The total number of users having their VMs located on server S_i is obtained by computing Σ_{k=1}^{M} β_ki, and a server hosting VMs of more than one user is counted as shared. The percentage of shared servers is computed over the time-interval {t_1, t_2} using Eq. (18). In contrast to the existing secure VM allocation scheme26, the proposed security model is capable of reducing co-residential vulnerability threats without any prior information about malicious users and VMs.
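A small sketch of the shared-server metric behind Eq. (18); the aggregation over the interval {t_1, t_2} is assumed to be an average of instantaneous samples, and the data layout is illustrative.

```python
def shared_server_percentage(hosts_users):
    """Eq. (18) sketch: percentage of active servers hosting VMs of more
    than one user (beta_ki = 1 when server i runs a VM of user k, so a
    server with sum_k beta_ki > 1 counts as shared).

    hosts_users: list of sets; hosts_users[i] holds the user ids whose
    VMs reside on server i at the sampled instant (illustrative layout).
    """
    active = [users for users in hosts_users if users]   # skip idle servers
    if not active:
        return 0.0
    shared = sum(1 for users in active if len(users) > 1)
    return 100.0 * shared / len(active)

# Averaging these instantaneous values over the samples taken in
# {t1, t2} would give the interval figure used in the security objective.
```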
Resource utilization modeling. Let S_i^C, S_i^Mem, and S_i^BW be the CPU, memory, and bandwidth capacity, respectively, of the ith server, and let V_j^C, V_j^Mem, and V_j^BW represent the CPU, memory, and bandwidth utilization, respectively, of the jth VM. When S_i is active, ϒ_i = 1; otherwise it is 0. The CPU, memory, and bandwidth utilization of a server can be estimated by applying Eqs. (19)-(21). Equation (22) calculates the resource utilization of a server (RU_Si^R : R ∈ {C, Mem, BW}), and the complete resource utilization of the data centre (RU_CDC) is determined by applying Eq. (23), where N is the number of resources observed.
Server power consumption modeling. Consider that all the servers support the inbuilt Dynamic Voltage Frequency Scaling (DVFS) energy-saving technique27, which defines two CPU states: inactive and active. In the inactive state, the CPU works in its least operational mode with a reduced clock cycle, and some internal components of the CPU are set inactive. On the other hand, in the active state, power consumption depends on the CPU utilization rate and the processing application. Therefore, the power consumption of the ith server is formulated as PW_Si, and the total power consumption over the time-interval {t_1, t_2} as PW_CDC, as given in Eqs. (24) and (25), respectively, where RU_Si ∈ [0, 1] is the resource utilization of server S_i.
Power usage effectiveness. This is a very significant metric for measuring the power efficiency of a CDC. It is expressed as the ratio of the total power supplied (PW_Si^total) to a server S_i, to run its processing equipment and other overheads such as cooling and support systems, to the power effectively utilized (PW_Si^utilized) by it. Equations (26) and (27) calculate the power usage effectiveness of a server S_i and of the CDC, respectively.
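The sketch below illustrates Eqs. (24) and (26). Since the body of Eq. (24) is not reproduced above, a linear DVFS-style model between PW_min and PW_max is assumed, and the rating figures in the example are illustrative rather than taken from Table 1.

```python
def server_power(ru, pw_min, pw_max, active=True):
    """Eq. (24) sketch under an assumed linear DVFS-style model: an
    active server draws pw_min when idle and approaches pw_max as
    utilization ru -> 1; a switched-off server draws nothing."""
    if not active:
        return 0.0
    return pw_min + (pw_max - pw_min) * ru

def pue(pw_total, pw_utilized):
    """Eq. (26): total supplied power over power effectively used for
    computation; values approaching 1 indicate higher power efficiency."""
    return pw_total / pw_utilized

# Illustrative figures (not from Table 1): a 105 W idle / 310 W peak
# server at 75% utilization draws 258.75 W.
print(server_power(0.75, 105.0, 310.0))
```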
Carbon footprint rate. The carbon emission intensity varies in accordance with the source of electricity generation. Here, the variables subscripted S, W, and N refer to the carbon intensity of the energy sources: solar, wind, and non-renewable, respectively. The carbon intensity is measured in tons per megawatt-hour (Tons/MWh) of electricity used. The emission of carbon dioxide into the environment directly depends on the carbon intensity, represented as CFR(V_j) and computed by applying Eq. (28)4.
VM management. The VMs are allocated by utilizing Multi-objective DPBHO (M-DPBHO), which is an integration of the proposed DPBHO algorithm and the pareto-optimal selection procedure of the Non-dominated Sorting based Genetic Algorithm (NSGA-II)28. M-DPBHO comprises four steps: (i) initialization, (ii) evaluation, (iii) selection, and (iv) position update. As illustrated in Fig. 9, X VM allocations, represented as stars/solutions indexed 1, 2, …, X in generation g, are randomly initialized. These stars are evaluated using a fitness function η(·). The population of stars is distributed into K sub-populations, and local best blackholes (Lbest) are selected by estimating the fitness value using the pareto-optimal selection procedure of NSGA-II. Thereafter, a second-phase population is generated with the help of heuristic crossover [using Eq. (2)]. Similar to the local phase, a global best solution (Gbest) is observed from the second-phase population using the pareto-optimal procedure. Therefore, to select the best VMP solution, the pareto-front selection procedure of NSGA-II is invoked, which treats all the objectives non-dominantly. A solution i dominates another solution j if its fitness value is better than that of j on at least one objective and the same or better on the other objectives. The position update step of DPBHO [Eq. (4), with Eqs. (5) and (6) computing the event horizon radii of the local and global blackholes, and Eqs. (7) and (8) determining the distance of a candidate solution from the local and global blackholes] is invoked to regenerate or update the existing population. Let a user job request r be distributed into sub-units or tasks {τ_1, τ_2, …, τ_z}. Eq. (29) is employed to select an appropriate VM for user application execution, where the small, medium, large, and extra-large types of VM have resource capacities R ∈ {CPU, memory} depending on their particular type, and τ_i^R represents the resource utilization of the ith task. If the maximum resource requirement among the tasks is less than or equal to the resource capacity of the small VM type, then a small VM is assigned to the task.
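The pareto-optimal selection that M-DPBHO borrows from NSGA-II reduces to the dominance test described above; a minimal sketch (with `objectives` an illustrative callable returning a tuple of minimized objective values per solution) is:

```python
def dominates(f_i, f_j):
    """f_i dominates f_j if it is no worse on every minimized objective
    and strictly better on at least one (the non-dominance rule quoted
    above, with all objectives cast as minimization)."""
    return (all(a <= b for a, b in zip(f_i, f_j))
            and any(a < b for a, b in zip(f_i, f_j)))

def pareto_front(population, objectives):
    """Return the non-dominated placement solutions. `objectives` maps a
    solution to its tuple of objective values (power, carbon,
    contention, ...), all to be minimized."""
    scored = [(sol, objectives(sol)) for sol in population]
    return [s for s, f in scored
            if not any(dominates(g, f) for _, g in scored)]
```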

Background and discussion
The background study discusses several approaches proposed thus far for cloud resource provisioning using meta-heuristic approaches29 and machine learning algorithms for cloud workload analysis30. An online prediction based multi-objective load-balancing (OP-MLB) framework is proposed in 18 for energy-efficient data centres. The forthcoming load on VMs is estimated using an Auto-Adaptive Differential Evolutionary (AADE) trained neural network-based prediction system to determine the future resource utilization of the servers proactively. It also detects an overload condition on each server and tackles it by migrating the VMs of highest resource capacity from the overloaded server to an energy-efficient server machine. The VM placement and migration are executed using a non-dominated sorting genetic algorithm based multi-objective algorithm for minimization of power consumption. A distributive UPS topology at server-level and rack-level for cloud resource management is proposed in 14. This framework establishes VM placement, determines the appropriate time for battery charging and discharging, and selects a battery that minimizes the peak demands and the monthly electricity bill. The VM requests are scheduled by a Slack and Battery Aware (SBA) placement based on the power state of the servers, resource utilization, and the amount of energy stored in server batteries. It helps to reduce the number of active servers and maximize the accessible stored energy to be utilized during peak demands. Dabbagh et al.21 presented an integrated energy-efficient VM placement and migration framework for cloud data centres. It applies Wiener filter prediction with a safety margin (WP-SM) to estimate the number of VM requests and the future resource requirement. These predicted values are used to keep only the required number of physical machines in the active state, achieving substantial energy saving and resource utilization. Kaur et al.4 presented a Boruta algorithm driven multi-objective optimization based job scheduling (BM-JS) scheme along with energy-efficient VM placement for a sustainable cloud environment. Specifically, they classified the upcoming workload using the Boruta algorithm and a sensitive-hashing-based support vector machine approach, followed by greedy VM placement to reduce carbon footprint and energy consumption. A secure and multi-objective VM placement (SVMP) framework is proposed in 20, where an integrated version of the whale optimization algorithm and a non-dominated sorting based genetic algorithm is implemented to attain multiple objectives concurrently. Marahatta et al.17 proposed a failure management aware cloud resource distribution approach named Prediction based Energy-aware Fault-tolerant Scheduling (PEFS). Specifically, a deep neural network based failure predictor is utilized to differentiate between failure-prone and non-failure-prone tasks. Three replicas are executed for failure-prone tasks on separate servers to prevent redundant execution on the same server, while non-failure tasks execute normally. Nguyen et al.15 addressed the VM consolidation problem by adopting multiple usage prediction, applying multiple linear regression to estimate the relationship between the input variables and the output for energy-efficient data centres.
This work performs overloaded host detection with multiple usage prediction (OHD-MUP) and underloaded host detection with multiple usage prediction (UHD-MUP), and balances load by migrating selected VMs from overloaded servers to energy-efficient servers. A metaheuristic-based Fuzzy C-means clustering (MTFC) mechanism is proposed in 31 to locate the most promising clusters according to the users' Quality-of-Service (QoS) requirements. Further, grey wolf optimization is applied to make appropriate scaling decisions for cloud resource provisioning. Tarahomi et al.32 proposed a micro-genetic approach (MGA) for power-efficient distribution of physical resources for sustainable cloud services. The micro-genetic algorithm helps to select suitable destination hosts for VMs among the physical machines. Likewise, a resource elasticity management issue is resolved in 33 by proposing an elastic controller based on Colored Petri Nets (EC-CPN) that assists in the automatic handling of over-/under-provisioning of resources. A co-location resistant VM placement method, "Previously Co-located Users First" (PCUF), is presented in 16, where VMs are placed and co-located according to the user identities of their previous allocations in order to reduce co-residency attacks. A Link Based Virtual Resource Management (LVRM) algorithm is proposed in 22, which employs a mapping of virtual links and nodes to reduce their impact on request execution time and minimize the number of active servers. It assigns the highest priority to the virtual link having the maximum network bandwidth to minimize the execution time of a request. Also, it assigns multiple VMs to a single server by applying Dijkstra's algorithm for selection of the substrate path between two servers so as to enhance the request execution rate. To meet the dynamic demands of future applications, an energy-efficient resource provisioning framework is developed in 19. This framework addresses the challenges of resource wastage, performance degradation, and QoS by comparing an application's predicted resource requirement with the resource capacity of VMs and consolidating the entire load on the minimum number of servers. An online multi-resource feed-forward neural network (OM-FNN), optimized with a Tri-adaptive Differential Evolutionary (TaDE) algorithm, is developed to forecast multiple resource demands, and the predicted VMs are placed on energy-efficient servers. This integrated approach optimizes resource utilization and energy consumption.
The majority of the existing works have investigated the sustainability of CDCs with respect to energy consumption only, and a few others have studied resource utilization while ignoring carbon emission and power usage efficiency, which are essential criteria to be considered during sustainable resource management. Further, none of the prior works have considered security along with sustainability during VM consolidation. In light of the existing approaches, the proposed SaS-LM model addresses multiple objectives associated with the sustainability of CDCs and also considers the security of users' applications under processing in real-time. The DPBHO-trained workload analyser learns resource usage patterns and characteristics with high accuracy, allowing enhanced utilization of servers, improved PUE, and reduced carbon emission. Also, the multi-objective DPBHO based VM management consolidates VMs on the most efficient servers, catering to multiple objectives for enhanced sustainability of CDCs with usage of green power supply while meeting QoS constraints simultaneously. Table 17 compares the SaS-LM model with the state-of-the-art approaches thoroughly.

Conclusion and future work
A novel SaS-LM model is proposed to provide a pareto-optimal solution for secure and sustainable workload management in the green cloud environment. The model incorporates a newly developed DPBHO evolutionary optimization algorithm for neural network-based resource usage estimation. Further, multi-objective DPBHO-based real-time VM placement and management are presented to serve the perspectives of both the cloud user and the service provider concurrently. There is a substantial reduction in security attacks, carbon emission, and power consumption, with an improvement in resource utilization and PUE. The achieved results show the superiority of the SaS-LM model compared to the existing state-of-the-art approaches. Also, a trade-off is observed revealing that sustainability improves at the cost of security and vice versa. In the future, the proposed model can be extended by prioritizing the objectives as per dynamic requirements and by adding objectives such as trust- and reliability-based VM allocation.

Data availability
The dataset used and/or analysed during the current study is available from the corresponding author on reasonable request.